Abstract

Targeted DNA enrichment coupled with next-generation sequencing has been increasingly used to interrogate selected sub-genomic regions at high depth of coverage in a cost-effective manner. Specificity, measured by on-target efficiency, is a key performance metric for target enrichment. Non-specific capture produces off-target reads, wasting sequencing throughput on irrelevant regions. Microdroplet-PCR allows simultaneous amplification of up to thousands of regions in the genome and is among the most commonly used strategies for target enrichment. Here we show that carryover of single-stranded template genomic DNA from microdroplet-PCR is a major contributor to off-target reads in the resultant libraries. Moreover, treatment of microdroplet-PCR enrichment products with a nuclease specific to single-stranded DNA reduces the off-target load and improves enrichment specificity. We propose that nuclease treatment of enrichment products be incorporated into the workflow of targeted sequencing that uses microdroplet-PCR for target capture. These findings may have a broad impact on other PCR-based applications in which removal of template DNA is beneficial.
Introduction

Since the launch of the first commercial massively parallel pyrosequencing platform in 2005, next-generation sequencing technology has transformed genomic medicine on both the basic and clinical research fronts [1]. The past few years have seen wide application of whole exome sequencing and whole genome sequencing in disease gene discovery, clinical molecular diagnostics and personalized medicine [1]-[7]. However, despite the decreasing cost of next-generation sequencing, whole exome and whole genome sequencing remain expensive, especially when high depth of coverage is needed. In addition, significant portions of the exome or genome are not sufficiently sequenced, and these coverage gaps currently make the overall variant detection sensitivity of these technologies less than optimal [8], [9]. Targeted DNA enrichment coupled with next-generation sequencing allows interrogation of relevant genomic regions at high depth of coverage in a cost-effective manner and is well suited for applications such as molecular diagnosis of diseases with complex but defined genetic etiologies [10].

Several approaches are now available for enriching selected regions of the genome for sequencing, each with its own advantages and disadvantages [11]-[14]. Among these, microdroplet polymerase chain reaction (PCR) allows simultaneous amplification of up to thousands of target regions through highly multiplexed microfluidic PCR in picoliter reaction volumes [15]. This straightforward enrichment approach usually yields deep and even coverage and is particularly well suited for capturing small target regions [12], [13], [16]. Amplified PCR products from microdroplet-PCR enrichment are usually end-repaired, concatenated through ligation and then processed into platform-specific libraries for sequencing. This method has been widely adopted for both research and clinical applications by many groups, including our own [17]-[24].

On-target efficiency measures how specifically a capture method enriches target regions in the context of the whole genome [13], [18], [25]. Improving on-target efficiency is desirable for any capture method because it directly determines how much sequencing throughput is needed to achieve a given depth of on-target coverage [13], [18], [25]. This becomes even more critical in settings such as detection of rare somatic variants in oncology, where ultra-deep on-target coverage is needed [26]-[29].
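The dependence of sequencing throughput on on-target efficiency can be made concrete with a back-of-the-envelope estimate. The sketch below is a simplification, not a method from this study: it assumes every on-target read contributes its full length to target coverage and ignores duplicates, partial overlaps and uneven coverage. The 50 kb target, 1000x depth and 150 bp read length are hypothetical.

```python
import math


def reads_required(target_bp: int, mean_depth: float, read_len_bp: int,
                   on_target_eff: float) -> int:
    """Estimate aligned reads needed to reach a mean on-target depth.

    Simplifying assumption: each on-target read adds read_len_bp bases of
    target coverage (no duplicates, no partial overlaps).
    """
    if not 0 < on_target_eff <= 1:
        raise ValueError("on_target_eff must be in (0, 1]")
    on_target_reads = target_bp * mean_depth / read_len_bp
    # Off-target reads are pure overhead: divide by the on-target fraction.
    return math.ceil(on_target_reads / on_target_eff)


if __name__ == "__main__":
    # Hypothetical panel: 50 kb of target, 1000x mean depth, 150 bp reads.
    for eff in (0.25, 0.50, 0.75):
        n = reads_required(50_000, 1000, 150, eff)
        print(f"on-target efficiency {eff:.0%}: ~{n:,} aligned reads")
```

Because the required read count scales with 1/efficiency under these assumptions, raising on-target efficiency from 25% to 75% cuts the reads needed for the same depth by a factor of three.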

Here we show that carryover of single-stranded template genomic DNA from microdroplet-PCR contributes significantly to off-target reads in the resultant libraries. More importantly, treatment of enriched DNA with mung bean nuclease, an endonuclease specific to single-stranded DNA or RNA [30], can dramatically reduce genomic DNA carryover and increase the on-target efficiency of the resultant library. We propose that nuclease treatment of enrichment products be incorporated into the workflow of targeted sequencing using microdroplet-PCR for enrichment. Our findings may have a broad impact on other PCR-based applications in which removal of template DNA is beneficial.

Materials and Methods

Ethics Statement 

De-identified patient DNA samples left over from previous 
genetic tests were used for evaluating performance metrics for 
targeted DNA enrichment. Since the research did not involve 
interaction with either subjects or their private identifiable 
information, the use of the samples did not meet the definition 
of human subjects research as defined in 45 CFR 46.102(f) and 
therefore, IRB review and informed consent were not required 
according to the IRB policies of the Children's Hospital of 
Philadelphia. 

Target enrichment using the RainDance microdroplet-PCR technology

Target regions of this study include all genomic regions covered by PCR amplicons spanning the coding exons of 11 genes involved in the pathogenesis of Noonan spectrum disorders. Primers used to amplify these genes were designed using the RainDance primer design pipeline (Table S1).

Target regions were captured using the RainDance microdroplet-PCR technology following RainDance's protocol. Briefly, 3 µg of human genomic DNA was sheared into 2-5 kb fragments using the Covaris S2 (Covaris). The PCR master mix was made by combining 1.5 µg of the above sheared DNA fragments, 4.7 µl of 10× High-Fidelity Buffer (Life Technologies), 1.26 µl of MgSO4 (Life Technologies), 1.6 µl of 10 mM dNTP (New England Biolabs), 3.6 µl of 4 M Betaine (Sigma), 3.6 µl of RDT Droplet Stabilizer (RainDance Technologies), 1.8 µl of DMSO (Sigma), 0.7 µl of 5 units/µl Platinum High-Fidelity Taq (Life Technologies), and nuclease-free water to a final reaction volume of 25 µl. The master mix was partitioned into microdroplets and merged on-chip with microdroplets of custom primer libraries (RainDance Technologies) using the RainDance 1000 (RainDance Technologies). The emulsion containing the merged microdroplets for each sample was dispensed into separate PCR tubes and thermal cycled using the following profile: 94°C for 2 min; 55 cycles of 94°C for 15 s, 54°C for 15 s and 68°C for 30 s; final extension at 68°C for 10 min; and a 4°C hold. After PCR amplification, the emulsion was broken by adding 50 µl of RDT 1000 Droplet Destabilizer (RainDance Technologies), vortexing for 15 s and spinning at 13,000 × g for 5 min. The bottom oil phase was carefully removed, and the remaining sample was purified using a MinElute column (Qiagen) following Qiagen's MinElute PCR Purification protocol. The sample was eluted in 12 µl of EB buffer. 1 µl of the eluate was analyzed on a DNA 1000 chip (Agilent) using the Bioanalyzer 2100 (Agilent) to verify consistency of the histogram pattern and quantify the concentration of the enrichment products.
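Because the genomic DNA input is specified by mass rather than volume, the nuclease-free water added to the master mix depends on the concentration of the sheared DNA. The sketch below computes it from the component volumes listed above, assuming a 25 µl final volume; the helper name and the 300 ng/µl DNA concentration in the example are hypothetical.

```python
# Fixed-volume components of the microdroplet-PCR master mix (µl), as listed
# in the protocol above.
FIXED_COMPONENTS_UL = {
    "10x High-Fidelity Buffer": 4.7,
    "MgSO4": 1.26,
    "10 mM dNTP": 1.6,
    "4 M Betaine": 3.6,
    "RDT Droplet Stabilizer": 3.6,
    "DMSO": 1.8,
    "Platinum High-Fidelity Taq (5 U/ul)": 0.7,
}


def master_mix_water_ul(dna_ng: float, dna_conc_ng_per_ul: float,
                        final_volume_ul: float = 25.0) -> float:
    """Return the nuclease-free water volume (µl) needed to reach the final volume.

    Hypothetical helper; final_volume_ul defaults to the assumed 25 µl.
    """
    dna_ul = dna_ng / dna_conc_ng_per_ul
    water_ul = final_volume_ul - sum(FIXED_COMPONENTS_UL.values()) - dna_ul
    if water_ul < 0:
        # The DNA is too dilute to fit in the remaining volume budget.
        raise ValueError(f"{dna_ul:.2f} ul of DNA exceeds the available volume")
    return water_ul


if __name__ == "__main__":
    # Hypothetical input: 1.5 µg (1500 ng) of sheared DNA at 300 ng/µl.
    print(f"water: {master_mix_water_ul(1500, 300):.2f} ul")
```

With the fixed components totalling 17.26 µl, 1.5 µg of DNA must fit into at most 7.74 µl under these assumptions, i.e. the sheared DNA needs to be at roughly 194 ng/µl or higher.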

Nuclease treatment of enriched RainDance PCR products

100 ng of RainDance PCR products were mixed with 2.5 µl of 10× mung bean nuclease buffer (New England Biolabs), 2 µl of 10 units/µl mung bean nuclease (New England Biolabs) and nuclease-free water to a final reaction volume of 25 µl. The digestion reaction was incubated at 30°C for 30 min. After nuclease treatment, DNA was purified using 37.5 µl of Agencourt AMPure XP beads (Beckman Coulter) following the manufacturer's instructions and eluted in 42.5 µl of nuclease-free water. Untreated samples consisting of 100 ng aliquots of captured DNA from the same batch of RainDance enrichment served as paired controls.
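The digestion setup can be checked with simple arithmetic: 2.5 µl of 10× buffer in 25 µl gives a 1× final buffer concentration, and 2 µl of 10 units/µl enzyme supplies 20 units for 100 ng of DNA (200 units per µg). The helper below is a hypothetical sketch of that calculation; the 20 ng/µl DNA concentration used in the example is an assumption, since the Methods specify only the DNA mass.

```python
from dataclasses import dataclass


@dataclass
class NucleaseSetup:
    dna_ul: float          # volume of enriched DNA added (µl)
    water_ul: float        # nuclease-free water to reach the final volume (µl)
    buffer_final_x: float  # final buffer concentration (x)
    units: float           # total mung bean nuclease units
    units_per_ug: float    # enzyme units per µg of DNA


def mung_bean_setup(dna_ng: float, dna_conc_ng_per_ul: float,
                    final_ul: float = 25.0, buffer_stock_x: float = 10.0,
                    buffer_ul: float = 2.5, enzyme_ul: float = 2.0,
                    enzyme_units_per_ul: float = 10.0) -> NucleaseSetup:
    """Hypothetical helper; defaults mirror the reaction described above."""
    dna_ul = dna_ng / dna_conc_ng_per_ul
    water_ul = final_ul - buffer_ul - enzyme_ul - dna_ul
    if water_ul < 0:
        raise ValueError("DNA too dilute for the reaction volume")
    units = enzyme_ul * enzyme_units_per_ul
    return NucleaseSetup(
        dna_ul=dna_ul,
        water_ul=water_ul,
        buffer_final_x=buffer_stock_x * buffer_ul / final_ul,
        units=units,
        units_per_ug=units * 1000.0 / dna_ng,
    )


if __name__ == "__main__":
    # 100 ng of enriched DNA at a hypothetical 20 ng/µl.
    print(mung_bean_setup(100, 20))
```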

Concatenation of enriched DNA, TruSeq library prep and sequencing using the Illumina MiSeq

For end repair, 100 ng of enrichment products, either treated or untreated with mung bean nuclease, were mixed with 10 µl of NEBNext end repair buffer (New England Biolabs), 5 µl of NEBNext end repair enzyme mix (New England Biolabs) and nuclease-free water to a final reaction volume of 100 µl. The end repair reaction was incubated at 37°C for 20 min. End-repaired DNA was purified using 150 µl of Agencourt AMPure XP beads (Beckman Coulter) following the manufacturer's instructions and eluted in 37.5 µl of nuclease-free water. For concatenation by ligation, 35 µl of end-repaired DNA was mixed with 10 µl of 5× NEBNext Quick Ligation buffer (New England Biolabs) and 5 µl of T4 DNA ligase (New England Biolabs). The ligation reaction was incubated at 20°C for 20 min. Concatenated DNA was purified using 75 µl of Agencourt AMPure XP beads (Beckman Coulter) following the manufacturer's instructions and eluted in 52.5 µl of nuclease-free water.
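All three AMPure XP clean-ups in this workflow use the same bead-to-sample volume ratio, which is not stated explicitly but follows from the volumes above. The short check below computes it; the ligation volume (50 µl) is inferred by summing the 35 µl of DNA, 10 µl of buffer and 5 µl of ligase. For SPRI-type beads, higher ratios generally retain smaller fragments, so a consistent ratio keeps size selection comparable across steps.

```python
# (step, reaction volume in µl, AMPure XP bead volume in µl), from the
# Methods text; the ligation volume is the sum of its listed components.
CLEANUPS = [
    ("mung bean nuclease digestion", 25.0, 37.5),
    ("end repair", 100.0, 150.0),
    ("ligation", 35.0 + 10.0 + 5.0, 75.0),
]


def bead_ratio(reaction_ul: float, bead_ul: float) -> float:
    """Bead-to-sample volume ratio (e.g. 1.5 means '1.5x beads')."""
    return bead_ul / reaction_ul


if __name__ == "__main__":
    for step, rxn_ul, bead_ul in CLEANUPS:
        print(f"{step}: {rxn_ul:g} ul + {bead_ul:g} ul beads "
              f"-> {bead_ratio(rxn_ul, bead_ul):.1f}x")
```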

50 µl of the above concatenated DNA was used as the starting material for Illumina TruSeq sequencing library preparation using the TruSeq DNA LT Sample Prep Kit (Illumina) following Illumina's TruSeq DNA Sample Preparation Guide. Resultant TruSeq libraries were quantified using the Qubit dsDNA BR kit (Life Technologies) and the dsDNA D1K TapeStation kit (Agilent) following the manufacturers' instructions. TruSeq libraries were sequenced on the MiSeq following Illumina's MiSeq System User Guide. Up to 16 differentially indexed libraries were pooled in equimolar ratios, denatured, diluted to 9 pM and then sequenced on the MiSeq (Illumina) using the 2×150 bp paired-end sequencing kit (Illumina). Nuclease-treated samples and their respective untreated paired controls were processed and sequenced in parallel to rule out batch-specific effects.
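Pooling libraries in equimolar ratios requires converting mass concentrations (Qubit, ng/µl) and mean fragment sizes (TapeStation, bp) into molarity. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the Methods do not state which conversion was used, and the library names, concentrations, sizes and the 50 fmol per library are hypothetical. The dilution factor ignores volume changes from the denaturation step, which should follow Illumina's guide.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate mass of one dsDNA base pair


def library_nM(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a dsDNA concentration in ng/µl to nM (1 nM = 1 fmol/µl)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_size_bp)


def equimolar_volumes(libs: dict, fmol_each: float) -> dict:
    """Volume (µl) of each library that contributes fmol_each femtomoles."""
    return {name: fmol_each / library_nM(conc, size)
            for name, (conc, size) in libs.items()}


def dilution_factor(pool_nM: float, target_pM: float = 9.0) -> float:
    """Fold dilution needed to bring the pool to the loading concentration."""
    return pool_nM * 1000.0 / target_pM


if __name__ == "__main__":
    # Hypothetical libraries: name -> (Qubit ng/µl, TapeStation mean size in bp).
    libs = {"sample4_untreated": (10.0, 500.0), "sample4_treated": (5.0, 400.0)}
    vols = equimolar_volumes(libs, fmol_each=50.0)
    pool_nM = 50.0 * len(vols) / sum(vols.values())
    for name, v in vols.items():
        print(f"{name}: {v:.2f} ul")
    print(f"pool: {pool_nM:.1f} nM; dilute {dilution_factor(pool_nM):.0f}-fold to 9 pM")
```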

Data analysis using the NextGENe software 

FASTQ data generated on the MiSeq were analyzed using the NextGENe software (SoftGenetics). Briefly, FASTQ data were first converted to FASTA format and aligned to the human reference genome hg19. A BED file containing the coordinates of the target regions (Table S2) was then applied to analyze on-target efficiency and generate coverage statistics. To detect variants potentially relevant to the pathogenesis of Noonan spectrum disorders, mutation reports were generated for variants within the exonic and splicing regions of the 11 Noonan-related genes (Table S3). Parameters used in the NextGENe software for data analysis are available upon request.
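NextGENe's internal criterion for counting a read as on-target is not described here. As a rough, software-independent illustration, the sketch below classifies aligned reads against BED intervals (0-based, half-open coordinates), counting a read as on-target if it overlaps any merged target interval by at least one base. The overlap rule, the hypothetical `TargetSet` class and the example coordinates are assumptions, not the NextGENe implementation.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end), as in BED


class TargetSet:
    """Merged, sorted target intervals per chromosome, built from BED lines."""

    def __init__(self, bed_lines: Iterable[str]) -> None:
        raw: Dict[str, List[Interval]] = {}
        for line in bed_lines:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split("\t")[:3]
            raw.setdefault(chrom, []).append((int(start), int(end)))
        self._ivs: Dict[str, List[Interval]] = {}
        for chrom, ivs in raw.items():
            merged: List[Interval] = []
            for s, e in sorted(ivs):
                if merged and s <= merged[-1][1]:
                    # Overlapping or touching interval: extend the previous one.
                    merged[-1] = (merged[-1][0], max(merged[-1][1], e))
                else:
                    merged.append((s, e))
            self._ivs[chrom] = merged
        self._starts = {c: [s for s, _ in ivs] for c, ivs in self._ivs.items()}

    def overlaps(self, chrom: str, start: int, end: int) -> bool:
        """True if the read [start, end) shares at least one base with a target."""
        ivs = self._ivs.get(chrom)
        if not ivs:
            return False
        # Last interval starting before the read's end; since intervals are
        # merged and sorted, it has the largest end among such candidates.
        i = bisect_left(self._starts[chrom], end) - 1
        return i >= 0 and ivs[i][1] > start


if __name__ == "__main__":
    targets = TargetSet(["chr12\t100\t200", "chr12\t150\t300", "chr12\t500\t600"])
    reads = [("chr12", 50, 120), ("chr12", 300, 400),
             ("chr12", 590, 700), ("chr1", 100, 200)]
    n_on = sum(targets.overlaps(*r) for r in reads)
    print(f"{n_on}/{len(reads)} reads on target")
```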

Results 

On-target efficiency for libraries derived from DNA enriched through RainDance microdroplet-PCR

Here we define on-target efficiency as the number of reads mapped to the target regions divided by the total number of reads mapped anywhere in the genome (aligned reads). We analyzed on-target efficiency for 3 samples enriched through RainDance microdroplet-PCR. As shown in Table 1, on-target efficiency ranged from 15.9% to 34.2%, with an average of 27.5%. These data show that most aligned reads (65.8% to 84.1%) map to regions of the genome other than the intended targets and therefore represent wasted sequencing throughput.
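As a check of this definition, the sketch below recomputes the efficiencies in Table 1 from the read counts. Dividing reads on target by aligned reads (not total reads) reproduces the reported values; for sample 1, dividing by total reads would instead give about 27.8%.

```python
# Read counts from Table 1: sample -> (total reads, aligned reads, reads on target)
TABLE1 = {
    1: (2193674, 1888037, 609182),
    2: (2002723, 1747191, 598175),
    3: (4516825, 4245272, 674674),
}


def on_target_efficiency(reads_on_target: int, aligned_reads: int) -> float:
    """Fraction of aligned reads that map to the target regions."""
    return reads_on_target / aligned_reads


if __name__ == "__main__":
    effs = {s: on_target_efficiency(on, aligned)
            for s, (_total, aligned, on) in TABLE1.items()}
    for sample, eff in effs.items():
        print(f"sample {sample}: {eff:.1%}")
    print(f"mean: {sum(effs.values()) / len(effs):.1%}")
```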
Table 1. On-target efficiency for libraries prepared from untreated RainDance captured DNA.

Sample ID   Total reads   Aligned reads   Reads on target   On-target efficiency
1           2193674       1888037         609182            32.3%
2           2002723       1747191         598175            34.2%
3           4516825       4245272         674674            15.9%

doi:10.1371/journal.pone.0103491.t001

Figure 1. Mung bean nuclease treatment abolished the high molecular weight DNA smear in RainDance captured DNA after end repair.

A. An overview of the workflow. Aliquots of 100 ng of DNA enriched through microdroplet-PCR were either treated with mung bean nuclease or left untreated as a control. The differentially treated aliquots were end-repaired, concatenated, processed into TruSeq libraries and sequenced on the MiSeq in parallel to rule out any batch effect. B. Electrophoresis analysis of DNA samples on a high-sensitivity DNA chip using the Agilent 2100 Bioanalyzer. DNA samples in all 4 lanes were derived from 200 pg of the same batch of captured DNA. From left to right, the lanes are: "L", the DNA size ladder; lane 1, untreated DNA enriched by RainDance microdroplet-PCR prior to end repair; lane 2, untreated DNA enriched by RainDance microdroplet-PCR after end repair; lane 3, mung bean nuclease-treated DNA enriched by RainDance microdroplet-PCR prior to end repair; lane 4, mung bean nuclease-treated DNA enriched by RainDance microdroplet-PCR after end repair.
doi:10.1371/journal.pone.0103491.g001

Table 2. On-target efficiency for libraries prepared from samples either treated or untreated with the mung bean nuclease.

Sample ID   Nuclease treatment   Total reads   Aligned reads   Reads on target   On-target efficiency   Fold increase
4           No                   2040068       1888854         513149            27.2%                  2.19
            Yes                  2176298       1798070         1069568           59.5%
5           No                   2344430       2158492         603053            27.9%                  2.09
            Yes                  1847463       1499265         877131            58.5%
6           No                   4185191       3917002         759764            19.4%                  3.84
            Yes                  3360424       2907096         2164377           74.5%

doi:10.1371/journal.pone.0103491.t002
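The fold-increase column is the ratio of the treated to the untreated on-target efficiency for each sample. The sketch below recomputes it from the raw read counts in Table 2 (on-target efficiency again taken as reads on target divided by aligned reads).

```python
# Table 2 read counts: sample -> {treatment: (aligned reads, reads on target)}
TABLE2 = {
    4: {"untreated": (1888854, 513149), "treated": (1798070, 1069568)},
    5: {"untreated": (2158492, 603053), "treated": (1499265, 877131)},
    6: {"untreated": (3917002, 759764), "treated": (2907096, 2164377)},
}


def efficiency(aligned: int, on_target: int) -> float:
    """Reads on target divided by aligned reads."""
    return on_target / aligned


def fold_increase(sample: int) -> float:
    """Treated / untreated on-target efficiency for one sample."""
    counts = TABLE2[sample]
    return efficiency(*counts["treated"]) / efficiency(*counts["untreated"])


if __name__ == "__main__":
    for sample in TABLE2:
        print(f"sample {sample}: {fold_increase(sample):.2f}-fold")
```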

Single-stranded genomic DNA carryover is converted to double-stranded DNA through DNA synthesis during end repair

Carryover of template genomic DNA has previously been reported to cause off-target reads and reduce on-target efficiency [15], [24]. We therefore examined whether genomic DNA contamination is present in our enriched DNA samples and underlies the low on-target efficiency observed in Table 1. Since template genomic DNA fragments are 2-5 kb in size while enrichment amplicons primarily range from 128 bp to 600 bp (Table S1), we expected these two populations of DNA molecules to be readily distinguishable by electrophoresis on a Bioanalyzer high-sensitivity DNA chip. However, we did not detect significant genomic DNA carryover by electrophoresis (lane 1 in Figure 1B). At first glance, this result seems to contradict our hypothesis that genomic DNA carryover is present in the enrichment product. However, we considered the possibility that template genomic DNA may have remained denatured and primarily single-stranded even after microdroplet-PCR. Because the dye in the Bioanalyzer dsDNA kit binds specifically to double-stranded DNA, single-stranded DNA molecules are effectively "invisible" on the chip.

Indeed, we observed a high molecular weight smear resembling sheared template genomic DNA in captured DNA after it had gone through end repair (Figure 1A and Figure 1B, lane 2). This is consistent with single-stranded template genomic DNA molecules being converted into double-stranded DNA by DNA polymerase during the end-repair reaction. To further test the hypothesis that the high molecular weight smear originated from single-stranded DNA, we treated captured DNA with mung bean nuclease, an endonuclease specific for single-stranded DNA or RNA [30], [31], prior to the end repair step. As shown in Figure 1B (lane 4), nuclease treatment abolished the majority of the high molecular weight smear in the enriched DNA after end repair. In contrast, the enriched amplicons were spared from mung bean nuclease digestion, suggesting that they remained double-stranded and thus resistant to the nuclease (Figure 1B, lanes 3 and 4). Taken together, these results suggest that at least part of the template genomic DNA carryover in captured DNA exists as single-stranded DNA and is converted to double-stranded DNA during the end-repair reaction.

Figure 2. Mung bean nuclease treatment significantly increases on-target efficiency for DNA enriched through RainDance microdroplet-PCR. Aliquots of RainDance-enriched DNA from the same sample were either treated or not treated with mung bean nuclease, processed into TruSeq libraries, and sequenced on the MiSeq as illustrated in Figure 1A. Plotted is the mean on-target efficiency of the 3 samples that went through parallel treatments (see also Table 2). Nuclease treatment leads to significantly higher on-target efficiency (*p = 0.018, one-tailed paired t test; error bars, SEM).
doi:10.1371/journal.pone.0103491.g002
Treating captured DNA with the mung bean nuclease increases on-target efficiency

If single-stranded genomic DNA carryover contributes to off-target reads in the resultant library, mung bean nuclease treatment, which selectively digests single-stranded DNA, should reduce off-target reads and improve on-target efficiency. To test this hypothesis, we sequenced libraries prepared from the same sample of enriched DNA, either with or without mung bean nuclease treatment prior to end repair (workflow illustrated in Figure 1A). As shown in Table 2, mung bean nuclease treatment increased on-target efficiency in the resultant libraries 2.1- to 3.8-fold across the 3 samples analyzed, a statistically significant improvement (Figure 2). These data further support the conclusion that at least part of the off-target reads can be attributed to single-stranded template genomic DNA carryover from microdroplet-PCR. In addition, these results demonstrate that treating microdroplet-PCR enriched DNA with a nuclease specific to single-stranded DNA, such as mung bean nuclease, is a highly effective way to reduce off-target reads and improve on-target efficiency in the resultant library.
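The significance test reported in Figure 2 is a one-tailed paired t test on the three treated/untreated pairs. With 3 pairs the test has 2 degrees of freedom, for which the Student's t distribution has a closed-form upper tail, P(T > t) = 1/2 - t / (2·sqrt(t² + 2)), so the sketch below needs only the standard library. It uses the on-target efficiencies recomputed from Table 2; the exact inputs behind the reported p = 0.018 are not stated, so the value obtained here (roughly 0.02) need not match to the last digit.

```python
import math
from statistics import mean, stdev

# On-target efficiencies per sample (untreated, treated), from Table 2 counts.
PAIRS = {
    4: (513149 / 1888854, 1069568 / 1798070),
    5: (603053 / 2158492, 877131 / 1499265),
    6: (759764 / 3917002, 2164377 / 2907096),
}


def paired_t_one_tailed_df2(pairs):
    """One-tailed paired t test (H1: treated > untreated) for exactly 3 pairs.

    Uses the closed-form Student's t tail for 2 degrees of freedom:
    P(T > t) = 1/2 - t / (2 * sqrt(t**2 + 2)).
    """
    diffs = [treated - untreated for untreated, treated in pairs]
    if len(diffs) != 3:
        raise ValueError("the closed form used here is valid only for 3 pairs (df = 2)")
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    p = 0.5 - t / (2.0 * math.sqrt(t * t + 2.0))
    return t, p


if __name__ == "__main__":
    t, p = paired_t_one_tailed_df2(PAIRS.values())
    print(f"t = {t:.2f}, one-tailed p = {p:.3f}")
```

With SciPy available, `scipy.stats.ttest_rel(treated, untreated, alternative="greater")` performs the same one-tailed paired test for any number of pairs.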

Mung bean nuclease treatment has no negative impact on variant detection

To assess the utility of nuclease treatment of captured DNA in applications such as variant detection for clinical diagnostics, we compared mutation reports generated using the NextGENe software for the same sample either treated or not treated with mung bean nuclease. As shown in Table 3, all variants detected with the standard protocol were also detected with the modified protocol that incorporated nuclease treatment. Moreover, as a result of increased on-target efficiency and thereby coverage, additional variants were detected in samples 4 and 5 when treated with mung bean nuclease. Both additional variants were later confirmed by Sanger sequencing (data not shown). These results suggest that mung bean nuclease treatment has no apparent negative impact on variant detection and instead improves detection of variants in the resultant library.

Discussion 

Microdroplet-PCR-based enrichment is among the most commonly used capture methods for targeted next-generation sequencing [12]. On-target efficiency is an important performance metric for measuring the specificity of a target enrichment strategy. Lower on-target efficiency means that more sequencing throughput, and thus higher cost, is required to achieve the same depth of on-target coverage. Although some studies found that microdroplet-PCR can achieve high capture specificity [15], experience from other studies, including our own, suggests there is still much room to improve on-target efficiency for this enrichment method [18], [21]-[24], [32]. We found that carryover of single-stranded genomic DNA is a major contributing factor to off-target reads in targeted libraries enriched using the microdroplet-PCR technology. Moreover, treatment of captured DNA with single-stranded DNA-specific endonucleases, such as mung bean nuclease, effectively removes genomic DNA carryover and thereby improves on-target efficiency without compromising variant detection (Tables 2 and 3).

Genomic DNA carryover has been known to adversely affect on-target efficiency since the early development of microdroplet-PCR technologies [15]. Here we demonstrate that at least part of the genomic DNA carryover from RainDance microdroplet enrichment exists in the single-stranded state. Given the complexity of the template genomic DNA and its relatively high concentration in the final enriched product, it is conceivable that single-stranded genomic DNA molecules interact with each other to form short stretches of double-stranded DNA through low-stringency base pairing at relatively low temperatures. These partial duplexes may provide primed templates for DNA polymerase-mediated extension during end repair, producing the high molecular weight smear observed after the end repair reaction (Figure 1). Once converted into double-stranded DNA, template genomic DNA carryover is processed together with captured DNA into the final sequencing library, leading to off-target reads.

In further support of our hypothesis, treatment of DNA enriched through microdroplet-PCR with mung bean nuclease, an endonuclease specific for single-stranded DNA [30], [31], diminished the high molecular weight smear observed in the enriched DNA after end repair. Nuclease treatment improved on-target efficiency in the resultant library. However, a substantial fraction of reads remained off-target even after the enriched DNA had been treated with mung bean nuclease (Table 2). It is possible that a low level of genomic DNA carryover anneals to form double-stranded DNA that resists the nuclease and leads to off-target reads. In addition, non-specific amplification during microdroplet-PCR may contribute to off-target reads. Consistent with previous observations [24], [32], on-target efficiency varied considerably among samples enriched through microdroplet-PCR in this study (Tables 1 and 2). This may be at least partially attributable to variability in the relative amount of genomic DNA carryover among enriched samples.

Mung bean nuclease digestion was previously used to remove PCR primers for direct sequencing of double-stranded PCR products without fragment purification [33]. We show here that template genomic DNA carryover in microdroplet-PCR products exists as denatured single-stranded DNA and therefore can also be removed by mung bean nuclease digestion. Treatment of PCR products with mung bean nuclease may improve on-target efficiency for other PCR-based enrichment methods, such as the Fluidigm Access Array, when a similar post-enrichment protocol is used [21], [34]. One way to circumvent off-target reads derived from single-stranded template genomic DNA is to fuse sequencing platform-specific adapter sequences to locus-specific PCR primers and thereby generate amplicon libraries directly from these primers through PCR [26], [28], [35]. However, such enrichment strategies have limitations, including the need for the adapter sequences and the sequencing platform to be preconfigured and fixed. In addition, amplicon size is limited by the read length of the sequencing kit [34].

To reduce genomic DNA carryover in microdroplet-PCR enriched DNA, Tewhey et al. biotinylated genomic DNA through nick translation and subsequently removed it from the enrichment product using streptavidin-coated beads [15]. Sivakumaran et al. also tried gel fractionation to purify RainDance-enriched PCR products prior to end repair [24]. In comparison, nuclease treatment is a straightforward way to remove genomic DNA carryover and is fully compatible with automation of library preparation. We propose that nuclease treatment of DNA enriched through microdroplet-PCR be incorporated into the sequencing library preparation workflow to improve on-target efficiency.

Supporting information 

Table S1 Primer sequences for microdroplet PCR and the length of resultant amplicons.

(XLSX)
Table S2 hg19 genomic coordinates of regions of interest for calculation of on-target efficiency.

(XLSX)

Table S3 hg19 genomic coordinates of exons with flanking splicing sites of genes associated with Noonan spectrum disorders.

(XLSX)